(54) PARALLEL OPERATION DEVICE, ENTERTAINMENT DEVICE, OPERATING METHOD,
COMPUTER PROGRAM, AND SEMICONDUCTOR DEVICE 



(57) The present invention provides a parallel arithmetic apparatus capable of easily
performing vector inner product operations as well as efficient matrix operations. A
parallel arithmetic apparatus (100a) is provided with pairs of registers (120a to 120d)
that record arithmetical elements to be operated on and FMACs (140a to 140d) that perform
sum-of-products operations based on the arithmetical elements recorded in these registers
(120a to 120d). Selectors (130a, 130b) are inserted between the register (120a) and the
FMAC (140a). During a matrix operation, the selectors (130a, 130b) supply the arithmetical
element recorded in the register (120a) to the FMAC (140a). During a vector inner product
operation, they select the registers (120a to 120d) one by one in a round-robin fashion
and supply the arithmetical element recorded in the selected register to the FMAC (140a).
Description

Technical Field 

[0001] The present invention relates to a technology for carrying out parallel processing that uses a plurality of
arithmetic units and, more particularly, to a parallel arithmetic processing technology for carrying out processing
such as geometry processing in computer graphics at high speed.

Background Art 

[0002] Some objects to be displayed with three-dimensional computer graphics may be modeled as a set of a plurality
of basic graphics (polygons). The vertices of a polygon are expressed by four-dimensional coordinates (x, y, z, w) using
homogeneous coordinates. The coordinates of the polygon vertices are subjected to coordinate transformation according
to the point-of-view coordinates and to perspective transformation, etc. according to distances. That is, transformation
is made in such a way that farther objects appear smaller. This series of processing is called "geometry processing".
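The effect of perspective transformation on homogeneous coordinates can be sketched as follows. This is only an illustration under a stated assumption: it uses the common convention that a point (x, y, z, w) is projected by dividing by w, which the text above does not spell out. The function name `perspective_divide` and the sample vertices are hypothetical.

```python
def perspective_divide(vertex):
    """Map homogeneous (x, y, z, w) to (x/w, y/w, z/w).

    Assumption: the usual homogeneous-coordinate convention; the
    Specification itself does not state this formula.
    """
    x, y, z, w = vertex
    if w == 0:
        raise ValueError("w must be non-zero for a point")
    return (x / w, y / w, z / w)


# Two hypothetical vertices with the same x and y; the second has a
# larger w, as would result for a point farther from the viewpoint.
near = perspective_divide((2.0, 2.0, 1.0, 1.0))
far = perspective_divide((2.0, 2.0, 4.0, 4.0))
```

With these sample values the farther vertex ends up closer to the origin of the projected plane, which is the sense in which farther objects appear smaller.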

[0003] There are various modes of geometry processing. For example, a matrix operation that uses a 4 x 4
transformation matrix is performed for rotation, expansion, contraction, perspective projection and parallel translation
of a polygon. An inner product operation is carried out to determine, for example, brightness on a light-receptive surface.
The matrix and inner product operations require repetitions of sum-of-products operations.

[0004] In three-dimensional computer graphics, processing methods that use floating-point numbers, conventionally
used in high-end systems, are now also used in the field of entertainment apparatuses for generating entertainment
images such as video game images and in fields with severe cost constraints such as portable information terminals.
This is because processing methods that use floating-point numbers increase the data dynamic range and facilitate
programming, and are therefore suited to sophisticated processing.

[0005] To carry out matrix operations on the floating-point numbers used in such processing, a parallel arithmetic
apparatus is available which incorporates a plurality of floating-point sum-of-products operators (FMACs: Floating
Multiply Accumulators) and carries out matrix operations efficiently. The plurality of FMACs allows arithmetic
operations to be performed in parallel, increasing the processing speed.

[0006] Some apparatuses, such as entertainment apparatuses and personal computers, that carry out three-dimensional
image processing can obtain fine and realistic three-dimensional images at high speed by carrying out the
aforementioned geometry processing using such a parallel arithmetic apparatus.

[0007] A parallel arithmetic apparatus with, for example, four FMACs placed in parallel can easily perform matrix
operations using a 4 x 4 transformation matrix as shown in the mathematical expression 1. However, it is difficult for
such an apparatus to perform an inner product operation between a vector A (Ax, Ay, Az, Aw) and a vector B (Bx, By,
Bz, Bw) as shown in the mathematical expression 2.

[0008] This is because the coordinates X, Y, Z and W that are subjected to processing are independently operated 
in a one-to-one correspondence with four FMACs. 
[0009] This is explained more specifically. 

[0010] When a matrix operation in the mathematical expression 1 is carried out, component values corresponding
to a single row of the transformation matrix and coordinate values of the coordinates to be transformed are fed into
each of four FMACs as arithmetical elements to be operated on. The component values of the transformation matrix and
the coordinate values of the coordinates entered are subjected to a sum-of-products operation to perform a matrix
operation. For example, component values (M11, M12, M13, M14) on the first row of the transformation matrix and
coordinate values of the coordinates (Vx, Vy, Vz, Vw) are subjected to a sum-of-products operation to calculate
"M11•Vx + M12•Vy + M13•Vz + M14•Vw". Since each of the four FMACs carries out a similar sum-of-products operation,
matrix operations are completed efficiently. In this Specification, "•" denotes multiplication.
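The row-per-FMAC arrangement described in this paragraph can be sketched in Python as below. This is a minimal software model, not a description of the actual circuit: each FMAC is represented by a running sum, and the function name `matrix_transform`, the sample matrix and the sample vertex are assumptions made only for illustration.

```python
def matrix_transform(m, v):
    """Model four FMACs, each accumulating one row of m times v."""
    acc = [0.0, 0.0, 0.0, 0.0]      # one accumulator per FMAC
    for step in range(4):           # one multiply-add per step
        for fmac in range(4):       # in hardware the four FMACs run in parallel
            acc[fmac] += m[fmac][step] * v[step]
    return acc


# Hypothetical translation matrix and vertex, chosen only for the example.
M = [[1.0, 0.0, 0.0, 2.0],
     [0.0, 1.0, 0.0, 3.0],
     [0.0, 0.0, 1.0, 4.0],
     [0.0, 0.0, 0.0, 1.0]]
V = [1.0, 1.0, 1.0, 1.0]
result = matrix_transform(M, V)
```

Because every FMAC needs only its own row and the shared coordinate values, no data has to cross between FMACs in this mode.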

[0011] When an inner product operation in the mathematical expression 2 is carried out, each of the four FMACs is
associated with one of the component values of the components X, Y, Z and W. Therefore, Ax and Bx, Ay and By, Az
and Bz, and Aw and Bw are supplied to the four FMACs, respectively. Ax•Bx, Ay•By, Az•Bz and Aw•Bw are calculated
as their respective outputs. Thus, execution of the mathematical expression 2 requires a separate adder for adding up
the outputs of the four FMACs, which increases the scale of the circuit.
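The limitation described here can be made concrete with a small sketch. It assumes, for illustration only, that each conventional FMAC is modeled as producing a single product and that the missing summation is a separate step; the names `conventional_inner_product`, `fmac_outputs` and `extra_adder_output` are hypothetical.

```python
def conventional_inner_product(a, b):
    """Model the conventional arrangement: one component pair per FMAC."""
    # Each FMAC sees only its own component pair and yields one product.
    fmac_outputs = [a[i] * b[i] for i in range(4)]
    # The four outputs must then be summed by a stage outside the FMACs.
    extra_adder_output = sum(fmac_outputs)
    return fmac_outputs, extra_adder_output


outputs, total = conventional_inner_product([1.0, 2.0, 3.0, 4.0],
                                            [5.0, 6.0, 7.0, 8.0])
```

The call to `sum` stands in for the additional adder that the conventional apparatus would need.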

[0012] As is apparent from the above, the conventional parallel arithmetic apparatus can provide efficient processing
of matrix operations. However, the FMACs provided in parallel cannot by themselves perform vector inner product
operations and sometimes require an additional adder.
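(MATHEMATICAL EXPRESSION 1)
[0013]

\[
\begin{pmatrix}
M_{11} & M_{12} & M_{13} & M_{14} \\
M_{21} & M_{22} & M_{23} & M_{24} \\
M_{31} & M_{32} & M_{33} & M_{34} \\
M_{41} & M_{42} & M_{43} & M_{44}
\end{pmatrix}
\begin{pmatrix} V_x \\ V_y \\ V_z \\ V_w \end{pmatrix}
=
\begin{pmatrix}
M_{11} \cdot V_x + M_{12} \cdot V_y + M_{13} \cdot V_z + M_{14} \cdot V_w \\
M_{21} \cdot V_x + M_{22} \cdot V_y + M_{23} \cdot V_z + M_{24} \cdot V_w \\
M_{31} \cdot V_x + M_{32} \cdot V_y + M_{33} \cdot V_z + M_{34} \cdot V_w \\
M_{41} \cdot V_x + M_{42} \cdot V_y + M_{43} \cdot V_z + M_{44} \cdot V_w
\end{pmatrix}
\]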



(MATHEMATICAL EXPRESSION 2)
[0014]

\[
(A_x, A_y, A_z, A_w) \cdot (B_x, B_y, B_z, B_w)
= A_x \cdot B_x + A_y \cdot B_y + A_z \cdot B_z + A_w \cdot B_w
\]

Disclosure of Invention 

[0015] It is a main object of the present invention to provide a parallel arithmetic apparatus capable of carrying out
vector inner product operations easily while carrying out matrix operations as efficiently as conventional parallel
arithmetic apparatuses.

[0016] A parallel arithmetic apparatus according to the present invention that solves the above-mentioned problems
comprises multiple pairs of recording means for recording arithmetical elements to be operated on and operating means
for performing sum-of-products operations based on the arithmetical elements recorded in the recording means, wherein
selecting means, which selects one of said recording means of all pairs and supplies the arithmetical elements
recorded in the selected recording means to the operating means of its own pair, is inserted between the recording
means and operating means of any one pair.

[0017] The parallel arithmetic apparatus of the present invention can, when the selecting means selects the recording
means of the pair in which the selecting means itself is inserted, perform operations using arithmetical elements
independent of each other in each pair. That is, it is possible to carry out matrix operations similar to those in the
conventional art.

[0018] On the other hand, when the selecting means selects one recording means after another from among all the
recording means in a round-robin fashion, it is possible to perform operations using the arithmetical elements recorded
in the recording means of each pair. That is, the parallel arithmetic apparatus of the present invention can perform
inner product operations easily without the need to use other circuits such as adders.
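The round-robin selection described above can be sketched in Python as follows. This is a behavioral model under stated assumptions rather than the claimed circuit: each recording means is represented by a tuple holding one pair of arithmetical elements, and the name `inner_product_round_robin` and the sample values are hypothetical.

```python
def inner_product_round_robin(registers):
    """registers: list of (a, b) pairs, one pair per recording means."""
    acc = 0.0
    for index in range(len(registers)):   # visit the recording means in turn
        a, b = registers[index]           # elements from the selected one
        acc += a * b                      # accumulate in a single operating means
    return acc


regs = [(1.0, 5.0), (2.0, 6.0), (3.0, 7.0), (4.0, 8.0)]
total = inner_product_round_robin(regs)
```

Since one operating means both multiplies and accumulates, the sum is complete after the last selection and no separate adder stage appears in the model.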

[0019] In this parallel arithmetic apparatus, temporary recording means for temporarily recording the arithmetical
elements recorded in the recording means of a pair in which the selecting means is not inserted can also be inserted
between the recording means and operating means of that pair. In this case, the selecting means is constructed in such
a way as to supply the arithmetical elements recorded in the temporary recording means to the operating means when
the recording means of the pair in which the selecting means is not inserted is selected.

[0020] Inserting the temporary recording means eliminates the need to occupy the output ports of the recording
means when arithmetical elements are taken in from the recording means. This allows the recording means and
operating means of the pair in which the temporary recording means is inserted to perform other processing.
[0021] In the parallel arithmetic apparatus, the recording means of all pairs record, during a matrix operation, a first
arithmetical element to be subjected to the matrix operation and, during a vector inner product operation, a second
arithmetical element to be subjected to the vector inner product operation. The selecting means is constructed in such
a way as to input, during the matrix operation, the first arithmetical element from the recording means of its own pair
to the operating means of its own pair and, during the inner product operation, to select the recording means of all the
pairs one by one in a round-robin fashion and supply the second arithmetical element from the selected recording
means to the operating means of its own pair.

[0022] Each of the operating means performs an operation independently assigned to its pair using the arithmetical
elements recorded in the recording means of the pair. When this parallel arithmetic apparatus is used for
three-dimensional computer graphics, such an operation is associated with any one of the components of
four-dimensional coordinates.

[0023] Another embodiment of the present invention is a parallel arithmetic apparatus that selectively performs a
matrix operation and a vector inner product operation, comprising: a plurality of recording means for recording, during
the matrix operation, a first arithmetical element to be subjected to the matrix operation and recording, during the inner
product operation, a second arithmetical element to be subjected to the inner product operation; a plurality of operating
means forming a one-to-one correspondence with the plurality of recording means for performing, during the matrix
operation, a sum-of-products operation by each operating means receiving the first arithmetical element recorded in
the corresponding recording means and performing, during the inner product operation, a sum-of-products operation
by a predetermined one of the operating means receiving the second arithmetical elements recorded in all the recording
means; and selecting means for selecting, during the matrix operation, the recording means corresponding to the
predetermined operating means and supplying a first arithmetical element recorded in this recording means to the
predetermined operating means, and selecting, during the inner product operation, the plurality of recording means one
by one in a round-robin fashion and supplying a second arithmetical element recorded in the selected recording means
to the predetermined operating means.

[0024] In such a parallel arithmetic apparatus, the operating means is constructed so as to carry out a
sum-of-products operation on floating-point numbers when, for example, the arithmetical elements are expressed with
floating-point numbers.

[0025] The entertainment apparatus according to the present invention is an entertainment apparatus that performs
image processing on an entertainment image by performing a matrix operation with regard to coordinates expressing
a position and a shape of an object and performing an inner product operation with regard to vectors used to express
an image of the object, comprising: a plurality of registers that record, during the matrix operation, a first arithmetical
element subjected to the matrix operation and record, during the inner product operation, a second arithmetical
element subjected to the inner product operation; a plurality of sum-of-products operators forming a one-to-one
correspondence with the plurality of registers that perform, during the matrix operation, a sum-of-products operation by
each sum-of-products operator receiving the first arithmetical element recorded in the corresponding register, and
perform, during the inner product operation, a sum-of-products operation by a predetermined one of the
sum-of-products operators receiving the second arithmetical elements recorded in all registers; and a selector that
selects, during the matrix operation, a register corresponding to the predetermined sum-of-products operator and
supplies a first arithmetical element recorded in this register to the predetermined sum-of-products operator, and
selects, during the inner product operation, the plurality of registers one by one in a round-robin fashion and supplies a
second arithmetical element recorded in the selected register to the predetermined sum-of-products operator.

[0026] Another entertainment apparatus of the present invention is an entertainment apparatus that performs image
processing on an entertainment image by carrying out a matrix operation between a matrix and coordinate values to
perform a coordinate transformation of coordinates expressing the position and a shape of an object and carrying out
an inner product operation between a normal vector oriented in the normal direction of the surface of the object and a
position vector of a light source to determine the display mode of the surface of the object, comprising: a plurality of
registers that record the coordinate values and component values corresponding to any one row of the matrix during
the matrix operation and record component values corresponding to any one component of the normal vector and of
the position vector during the inner product operation; a plurality of sum-of-products operators forming a one-to-one
correspondence with the plurality of registers that carry out a sum-of-products operation during the matrix operation by
each sum-of-products operator receiving the coordinate values recorded in the corresponding register and component
values corresponding to the one row of the matrix, and carry out a sum-of-products operation during the inner product
operation by a predetermined one of the sum-of-products operators receiving the component values of the normal
vector and of the position vector recorded in all registers; and a selector that selects, during the matrix operation, a
register corresponding to the predetermined sum-of-products operator and supplies the coordinate values recorded in
this register and component values corresponding to the one row of the matrix to the predetermined sum-of-products
operator, and selects, during the inner product operation, the plurality of registers one by one in a round-robin fashion
and supplies component values of the normal vector and the position vector recorded in the selected register to the
predetermined sum-of-products operator.

[0027] The processing method according to the present invention is a processing method that allows a matrix
operation and a vector inner product operation to be selectively executed and is executed by an apparatus provided
with a plurality of operating means, comprising the steps of supplying, during the matrix operation, arithmetical elements
subjected to the matrix operation by assigning the arithmetical elements to the plurality of operating means based on
the features thereof to carry out a sum-of-products operation based on the assigned arithmetical elements, and
supplying, during the inner product operation, arithmetical elements subjected to the inner product operation to one
predetermined operating means to allow the operating means to carry out a sum-of-products operation based on the
arithmetical elements.

[0028] The computer program according to the present invention is a computer program that makes it possible to
selectively execute a matrix operation and a vector inner product operation and causes a computer provided with a
plurality of operating means to execute the steps of supplying, during the matrix operation, arithmetical elements
subjected to the matrix operation by assigning the arithmetical elements to the plurality of operating means based on
the features thereof to carry out a sum-of-products operation based on the assigned arithmetical elements, and
supplying, during the inner product operation, arithmetical elements subjected to the inner product operation to one
predetermined operating means to allow the operating means to carry out a sum-of-products operation based on the
arithmetical elements.

[0029] The semiconductor device according to the present invention is a semiconductor device that makes it possible
to selectively execute a matrix operation and a vector inner product operation and is built in an apparatus incorporating
a computer provided with a plurality of operating means, causing the apparatus to execute the steps of supplying, during
the matrix operation, arithmetical elements subjected to the matrix operation by assigning the arithmetical elements
to the plurality of operating means based on the features thereof to allow each operating means to carry out a
sum-of-products operation based on the assigned arithmetical elements, and supplying, during the inner product
operation, arithmetical elements subjected to the inner product operation to one predetermined operating means to
allow the operating means to carry out a sum-of-products operation based on the arithmetical elements.



Brief Description of Drawings 
[0030] 

FIG. 1 is a block diagram of an entertainment apparatus; 
FIG. 2 is a block diagram of a parallel arithmetic apparatus; 
FIG. 3 is an internal block diagram of an FMAC; 

FIG. 4 is a flow chart showing a procedure for inner product operation processing; and 
FIG. 5 is a block diagram of a parallel arithmetic apparatus. 



Best Mode for Carrying Out the Invention 



[0031] Embodiments of the present invention are specifically described with reference to the accompanying drawings.

[0032] FIG. 1 is a view illustrating a configuration example of an entertainment apparatus including a parallel
arithmetic apparatus according to the present invention.

[0033] This entertainment apparatus 1 comprises two buses, a main bus B1 and a sub bus B2, to which a plurality
of semiconductor devices each having a specific function are connected. These buses B1 and B2 are connected to or
disconnected from each other through a bus interface INT.

[0034] The main bus B1 is connected to a main CPU 10 which is a main semiconductor device, a main memory 11
that is made up of a RAM, a main DMAC (Direct Memory Access Controller) 12, an MPEG (Moving Picture Experts
Group) decoder (MDEC) 13 and a graphic processing unit (hereinafter referred to as "GPU") 14 having a built-in frame
memory 15 which serves as a drawing memory. The GPU 14 is connected to a CRTC (CRT Controller) 16 for generating
a video output signal so that the data drawn in the frame memory 15 can be displayed on a display apparatus which
is not shown.

[0035] The main CPU 10 loads a bootstrap program from the ROM 23 on the sub bus B2 through the bus interface
INT upon activation of the entertainment apparatus 1, and executes the bootstrap program to start an operating system.
It controls the media drive 27, reads an application program or data from the medium 28 mounted in this media drive
27, and stores this in the main memory 11. It performs the above-described geometry processing on various data read
from the medium 28, such as three-dimensional object data (coordinate values of the vertices (representative points)
of polygons) made up of a plurality of basic graphics (the polygons). Then, as a result of the geometry processing, it
generates a display list containing polygon definition information (specifications of the shape of the polygon used, its
drawing position, and the type, color or texture of the components of the polygon).

[0036] A parallel arithmetic apparatus 100 is included in the main CPU 10 and is used for the geometry processing,
etc. Details of the parallel arithmetic apparatus 100 will be described later.

[0037] The GPU 14 is a semiconductor device having the functions of storing drawing context (drawing data including
polygon components), carrying out rendering processing (drawing processing) by reading drawing context according
to the display list notified from the main CPU 10 and drawing polygons in the frame memory 15. The frame memory
15 can also be used as a texture memory. Thus, a pixel image in the frame memory 15 can be pasted as texture to a
polygon to be drawn.

[0038] The main DMAC 12 is a semiconductor device that controls not only DMA transfer to each circuit connected
to the main bus B1, but also DMA transfer to each circuit connected to the sub bus B2 according to the state of the
bus interface INT. The MDEC 13 is a semiconductor device that operates in parallel with the main CPU 10 and has
the function of decompressing data compressed, for example, in the MPEG (Moving Picture Experts Group) or JPEG
(Joint Photographic Experts Group) format.

[0039] The sub bus B2 is connected to a sub CPU 20 including a microprocessor, a sub memory 21 made up of a
RAM, a sub DMAC 22, a ROM 23 on which a control program such as an operating system is stored, a sound processing
semiconductor device (SPU: Sound Processing Unit) 24 that reads sound data stored in the sound memory 25 and
produces it as an audio output, a communication control section (ATM) 26 that exchanges information with an
external apparatus via a network which is not shown, a media drive 27 in which a medium 28, such as a CD-ROM or
a DVD-ROM, can be loaded, and an input device 31.

[0040] The sub CPU 20 carries out various operations according to the control program stored in the ROM 23. The
sub DMAC 22 is a semiconductor device that controls operations such as DMA transfer to each circuit connected to
the sub bus B2 only when the bus interface INT disconnects the main bus B1 from the sub bus B2. The input device 31
is provided with a connection terminal 32 through which an input signal from an operating device 33 is supplied.

[0041] The entertainment apparatus 1 with such a configuration can carry out, at high speed, the matrix operations
and inner product operations performed during the geometry processing by means of the parallel arithmetic apparatus
100 included in the main CPU 10, which will be described below.

[0042] The parallel arithmetic apparatus 100 executes at high speed a matrix operation between a transformation
matrix and vertex coordinate values, which is performed when the coordinates of polygon vertices are transformed, and
an inner product operation between a normal vector oriented in the normal direction of the surface and a position vector
of a light source, which is carried out when a display condition such as brightness of the surface of an object is determined.
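How an inner product between a surface normal and a light-source position vector might feed a brightness value can be sketched as follows. This is only an illustration: the text states that such an inner product is used to determine a display condition but does not give a formula, so clamping negative results to zero is an assumption, as are the function name `surface_brightness` and the sample vectors.

```python
def surface_brightness(normal, light):
    """Inner product of two four-component vectors, clamped at zero.

    Assumption: the clamp is a common lighting convention and is not
    taken from the Specification.
    """
    dot = sum(n * l for n, l in zip(normal, light))
    return max(0.0, dot)


light = (0.0, 3.0, 4.0, 0.0)                                # hypothetical light vector
facing = surface_brightness((0.0, 0.0, 1.0, 0.0), light)    # surface turned toward the light
away = surface_brightness((0.0, 0.0, -1.0, 0.0), light)     # surface turned away from it
```

The four multiply-add steps inside `sum` correspond to the sum-of-products work that the apparatus performs for one inner product.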

<Embodiment 1> 

[0043] FIG. 2 shows a configuration example of the parallel arithmetic apparatus 100 included in the main CPU 10.

[0044] This parallel arithmetic apparatus 100a acquires coordinate values of polygon vertices and data (arithmetical
elements) that are necessary for the geometry processing such as a transformation matrix used for matrix operations
from the main memory 11 via the main bus B1 and carries out the operations.

[0045] The parallel arithmetic apparatus 100a is configured from a control circuit 110, registers 120a to 120d,
selectors 130a and 130b, FMACs 140a to 140d as arithmetic units, and an internal storage device 150. The registers 120a
to 120d and the internal storage device 150 are connected via the internal bus B.

[0046] The registers 120a to 120d each form a pair with the FMACs 140a to 140d, that is, the registers are designed 
to have a one-to-one correspondence with the FMACs. To allow matrix operations using a 4 x 4 transformation matrix 
and inner product operations of four-dimensional vectors, this embodiment uses four pairs of registers and FMACs, 
but the number of pairs can be determined according to the processing, as appropriate. 

[0047] The selectors 130a and 130b are provided between the register 120a and the FMAC 140a.

[0048] In this embodiment, the arithmetical elements used for the matrix operations and the inner product operations
are given as floating-point numbers, but fixed-point numbers can also be used instead. When the arithmetical elements
are expressed with fixed-point numbers, sum-of-products operators for fixed-point numbers will be used rather than
the FMACs 140a to 140d.

[0049] The control circuit 110 controls the overall operation of the parallel arithmetic apparatus 100a. For example,
the control circuit controls the internal storage device 150, the recording of arithmetical elements in the registers 120a
to 120d and the operations of the selectors 130a and 130b.

[0050] Under the control of the control circuit 110, the registers 120a to 120d take in from the internal storage device
150 and record the arithmetical elements assigned to the respective registers from among the arithmetical elements
used for operations such as matrix operations or inner product operations, namely component values of a
transformation matrix, coordinate values of coordinates to be transformed and vector component values.
[0051] When an inner product operation of four-dimensional vectors is carried out, the registers take in and record
component values assigned to the respective registers as arithmetical elements from among component values of two
four-dimensional vectors. For example, of the two four-dimensional vectors (Ax, Ay, Az, Aw) and (Bx, By, Bz, Bw), the
register 120a records component values Ax and Bx, the register 120b records component values Ay and By, the
register 120c records component values Az and Bz and the register 120d records component values Aw and Bw.
[0052] When a matrix operation is carried out using a 4 x 4 transformation matrix, the registers take in and record,
as arithmetical elements, the coordinate values of the four-dimensional coordinates to be transformed and component
values of the row of the transformation matrix assigned to the respective registers. For example, the registers 120a to
120d record component values of the transformation matrix in addition to coordinate values of the four-dimensional
coordinates; the register 120a records the component values of the 1st row of the transformation matrix, the register
120b records the component values of the 2nd row of the transformation matrix, the register 120c records the
component values of the 3rd row of the transformation matrix and the register 120d records the component values of the 4th
row of the transformation matrix as their respective arithmetical elements. The registers 120a to 120d each record a
pair of the 1st column component value of its row of the transformation matrix and the 1st component value of the
four-dimensional coordinate to be transformed, a pair of the 2nd column component value and the 2nd component
value, a pair of the 3rd column component value and the 3rd component value and a pair of the 4th column component
value and the 4th component value, and these values are read one pair at a time.
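The register contents described for the matrix operation can be pictured with the following sketch. It is a data-layout illustration only: each register is modeled as a list of four (matrix value, coordinate value) pairs, the placeholder numbers merely label matrix positions, and the function name `load_matrix_registers` is hypothetical.

```python
def load_matrix_registers(matrix, vector):
    """Return, for each register, the four (matrix value, coordinate) pairs."""
    return [
        [(matrix[row][col], vector[col]) for col in range(4)]
        for row in range(4)
    ]


# Placeholder numbers: 11 stands for M11, 12 for M12, and so on.
M = [[11, 12, 13, 14],
     [21, 22, 23, 24],
     [31, 32, 33, 34],
     [41, 42, 43, 44]]
V = [1, 2, 3, 4]                      # stands for (Vx, Vy, Vz, Vw)
registers = load_matrix_registers(M, V)
first_pair_of_120b = registers[1][0]  # pair of M21 and Vx
```

Reading `registers[i]` one pair at a time gives the sequence of operands consumed by the FMAC paired with that register.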

[0053] Furthermore, the registers 120a to 120d record calculation results of the FMACs 140a to 140d each forming
a pair therewith.

[0054] The selectors 130a and 130b select one of the registers 120a to 120d, take in an arithmetical element
recorded in the selected register and supply it to the FMAC 140a. When an inner product operation is carried out, the
selectors select one of the registers 120a to 120d in a round-robin fashion, take in an arithmetical element recorded
in the selected register and supply it to the FMAC 140a. When a matrix operation is carried out, the selectors always
select the register 120a, take in the arithmetical element recorded in the register 120a and supply it to the FMAC
140a.
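The two selection behaviors described in this paragraph can be summarized in a short sketch. It models only which register is chosen at each step; the mode strings, the step counter and the name `selected_register` are assumptions introduced for illustration, and in the apparatus the choice is indicated by the control circuit 110.

```python
def selected_register(mode, step, num_registers=4):
    """Return the index of the register the selectors pick at a given step."""
    if mode == "matrix":
        return 0                        # always the register 120a
    if mode == "inner_product":
        return step % num_registers     # 120a, 120b, 120c, 120d in turn
    raise ValueError("unknown mode: " + mode)


REGISTER_NAMES = ["120a", "120b", "120c", "120d"]
inner_order = [REGISTER_NAMES[selected_register("inner_product", s)]
               for s in range(4)]
matrix_order = [REGISTER_NAMES[selected_register("matrix", s)]
                for s in range(4)]
```

In matrix mode the FMAC 140a therefore behaves like the other three FMACs, while in inner product mode it receives the operands of all four registers in sequence.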

[0055] The selectors 130a and 130b select a register indicated by the control circuit 110 based on the content of the
operation carried out at that time, the state of progress of the operation, etc.

[0056] The FMACs 140a to 140d take in two arithmetical elements recorded in the registers 120a to 120d, multiply
them and add up the products.

[0057] FIG. 3 is an internal block diagram of the FMAC 140a. Since the other FMACs 140b to 140d also have a
similar configuration, only the configuration of the FMAC 140a is described here and description of the other FMACs
140b to 140d will be omitted.

[0058] In order to multiply and add up the arithmetical elements taken in, a floating-point number multiplier (FMUL:
Floating MULtiply) 141 and a floating-point number adder (FADD: Floating ADDer) 142 are provided. The two
arithmetical elements that are taken in are multiplied by the FMUL 141 first. The multiplication result is sent to the FADD
142. The FADD 142 adds up the multiplication results sent from the FMUL 141 one by one.

[0059] For example, when a0 to an and b0 to bn are taken in one after another as the arithmetical elements, the
FMAC 140a obtains the following calculation result:

\[
a_0 \cdot b_0 + a_1 \cdot b_1 + a_2 \cdot b_2 + \dots + a_{n-1} \cdot b_{n-1} + a_n \cdot b_n
\]
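The multiply-then-accumulate behavior of a single FMAC can be modeled as below. This is a simplified software sketch, assuming Python floats stand in for the floating-point hardware; the class name `FMAC`, its method `step` and the sample operands are hypothetical.

```python
class FMAC:
    """Toy model of one floating-point multiply accumulator."""

    def __init__(self):
        self.acc = 0.0                  # running sum kept by the adder stage

    def step(self, a, b):
        product = a * b                 # role of the FMUL 141
        self.acc = self.acc + product   # role of the FADD 142
        return self.acc


fmac = FMAC()
for a, b in zip([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]):
    fmac.step(a, b)
result = fmac.acc
```

After the three steps the accumulator holds 1•4 + 2•5 + 3•6, matching the general sum shown above for n = 2.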

[0060] The FMACs 140a to 140d supply the calculation results to the registers that form their respective pairs. 
[0061] Using the selectors 130a and 130b, the FMACs 140a to 140d perform the following operations during the 
inner product and matrix operations. 

[0062] When an inner product operation is carried out, the FMAC 140a multiplies the component values of the
components of two vectors supplied from the registers 120a to 120d via the selectors 130a and 130b and adds up the
multiplication results one by one. The number of times the multiplication and addition are performed may be counted,
making the state of progress of the inner product operation available so as to prevent the subsequent instruction from
starting until the inner product operation is completed.
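Counting the multiply-add steps to expose the progress of an inner product operation might look like the following sketch. It is one possible software analogy under stated assumptions, not the described hardware: the class `InnerProductTracker`, its `finished` check and the fixed count of four steps are hypothetical.

```python
class InnerProductTracker:
    """Accumulate products and count how many steps have been performed."""

    def __init__(self, total_steps=4):
        self.total_steps = total_steps
        self.completed_steps = 0
        self.acc = 0.0

    def step(self, a, b):
        self.acc += a * b               # one multiplication and one addition
        self.completed_steps += 1       # progress counter

    def finished(self):
        # A following instruction would be held back until this is True.
        return self.completed_steps >= self.total_steps


tracker = InnerProductTracker()
pairs = [(1.0, 5.0), (2.0, 6.0), (3.0, 7.0), (4.0, 8.0)]
status = []
for a, b in pairs:
    tracker.step(a, b)
    status.append(tracker.finished())
```

Only after the fourth pair has been processed does `finished` report completion, at which point the accumulated value is the full inner product.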

[0063] When a matrix operation is carried out, the FMACs 140a to 140d multiply the component values of the
transformation matrix taken in from the corresponding registers 120a to 120d by the coordinate values of the
four-dimensional coordinates with which they form pairs and add up the multiplication results one by one.

[0064] Under the control of the control circuit 110, the internal storage device 150 takes in from the main memory 11
and records the data necessary for the geometry processing, such as coordinate values of polygon vertices, component
values of the transformation matrix used for the matrix operations and vector component values. Furthermore, the
internal storage device 150 takes in and records the calculation results from the registers 120a to 120d. The
calculation results are sent to the main memory 11 via the internal storage device 150.

[0065] A direct memory access transfer is performed between the internal storage device 150 and the main memory
11, which allows high-speed data exchange and is convenient for processing that requires a large volume of data,
such as image processing.

[0066] The processing procedure when the parallel arithmetic apparatus 100a carries out the inner product operation
in the mathematical expression 2, that is, the inner product operation between the vector A (Ax, Ay, Az, Aw) and the 
vector B (Bx, By, Bz, Bw) is described. FIG. 4 is a flow chart showing such a processing procedure. 
[0067] The parallel arithmetic apparatus 100a takes in the component values of the vector A (Ax, Ay, Az, Aw) and
the vector B (Bx, By, Bz, Bw) stored in the main memory 11 through the direct memory access transfer and records
the component values in the internal storage device 150 (step S101).

[0068] The registers 120a to 120d take in the component values assigned to the respective registers from among 
the component values of the vector A (Ax, Ay, Az, Aw) and the vector B (Bx, By, Bz, Bw) stored in the internal storage 
device 150. That is, the register 120a takes in Ax and Bx, the register 120b takes in Ay and By, the register 120c takes
in Az and Bz and the register 120d takes in Aw and Bw (step S102).

[0069] The selectors 130a and 130b select one of the registers 120a to 120d, take in the component values of the
vector A and the vector B recorded in the selected register and supply them to the FMAC 140a. The control circuit
110 determines which of the registers 120a to 120d should be selected according to the state of the progress of the
inner product operation. The selectors 130a and 130b select one of the registers 120a to 120d under the control of the
control circuit 110. Here, the selectors first select the register 120a, take in Ax and Bx and supply them to the FMAC
140a (step S103). The FMAC 140a performs a sum-of-products operation between Ax and Bx using the FMUL 141
and the FADD 142 (step S104). Before the first sum-of-products operation is carried out, the internal state of the FMAC
140a is cleared.

[0070] After the sum-of-products operation, the FMAC 140a determines whether the inner product operation has
been completed or not (step S105). Whether the inner product operation has been completed or not can be determined
from the number of component values of the vectors subjected to the inner product operation. The number of
sum-of-products operations performed is counted, and the inner product operation is determined to have been
completed when the count value becomes equal to the number of component values of the received vectors. The
count value also makes it possible to find out the register from which the subsequent
component value should be extracted. The result of the determination as to whether the inner product operation has
been completed or not is sent to the control circuit 110.

[0071] In this case, the inner product operation has not been completed yet (step S105: N), and therefore the control
circuit 110 causes the selectors 130a and 130b to select the register 120b. The selectors 130a and 130b select the
register 120b under the control of the control circuit 110, take in Ay and By and supply them to the FMAC 140a. When
the FMAC 140a takes in Ay and By, the FMUL 141 and the FADD 142 perform a sum-of-products operation to obtain
Ax·Bx + Ay·By. Likewise, the steps S103 to S105 are repeated until the inner product operation is completed to
obtain Ax·Bx + Ay·By + Az·Bz + Aw·Bw.

[0072] Upon determining that the inner product operation has been completed (step S105: Y), the FMAC 140a
supplies the calculation result to the register 120a (step S106). After the output, the FMAC 140a clears the internal
state (step S107). The output calculation result is supplied from the register 120a to the internal storage device 150
and sent to the main memory 11.
[0073] This completes the inner product operations. 
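The procedure of steps S101 to S107 can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the apparatus itself: the function name `inner_product` and its variables are hypothetical, a Python list stands in for the registers 120a to 120d, a plain float stands in for the internal state of the FMAC 140a, and the vectors are assumed to be non-empty and of equal length.

```python
def inner_product(vec_a, vec_b):
    """Software walk-through of steps S101 to S107 for a single FMAC."""
    # S101/S102: register i holds the i-th pair of component values (A_i, B_i).
    registers = list(zip(vec_a, vec_b))

    acc = 0.0   # internal state of the FMAC, cleared before the first operation
    count = 0   # number of sum-of-products operations performed so far

    while True:
        # S103: the count value tells the selectors which register to pick next.
        a, b = registers[count]
        # S104: multiply, then add to the running sum.
        acc += a * b
        count += 1
        # S105: completed when the count equals the number of component values.
        if count == len(registers):
            break

    result = acc  # S106: the result is written back to the first register
    acc = 0.0     # S107: the internal state is cleared after the output
    return result
```

With A = (1, 2, 3, 4) and B = (5, 6, 7, 8), the loop visits the four registers in order and accumulates 5 + 12 + 21 + 32.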

[0074] The selectors 130a and 130b allow calculations between the component values of different components,
making it easier to carry out inner product operations. The selectors 130a and 130b are provided between the register
120a and the FMAC 140a, which is not intended to be restrictive. The selectors may also be provided between the 
register 120b and the FMAC 140b, between the register 120c and the FMAC 140c or between the register 120d and 
the FMAC 140d. 

[0075] When a matrix operation is performed, the selectors 130a and 130b always select the register 120a, supply
only the arithmetical element recorded in the register 120a to the FMAC 140a and never supply the arithmetical
elements recorded in the other registers 120b to 120d to the FMAC 140a. The arithmetical elements recorded in the
registers 120b to 120d are taken into the FMACs 140b to 140d with which the registers 120b to 120d form their respective
pairs and processed.

[0076] For example, when the matrix operation in the mathematical expression 1 is carried out, the register 120a
records the component values (M11, M12, M13, M14) of the 1st row of the transformation matrix and the coordinate
values (Vx, Vy, Vz, Vw) of the four-dimensional coordinates. The register 120b records the component values (M21,
M22, M23, M24) of the 2nd row of the transformation matrix and the coordinate values (Vx, Vy, Vz, Vw) of the
four-dimensional coordinates. The register 120c records the component values (M31, M32, M33, M34) of the 3rd row of
the transformation matrix and the coordinate values (Vx, Vy, Vz, Vw) of the four-dimensional coordinates. The register
120d records the component values (M41, M42, M43, M44) of the 4th row of the transformation matrix and the
coordinate values (Vx, Vy, Vz, Vw) of the four-dimensional coordinates.

[0077] The FMACs 140a to 140d sequentially take in the component values and the coordinate values recorded in
the registers 120a to 120d with which the FMACs form their respective pairs and carry out operations. Taking the
FMAC 140a as an example, the FMAC 140a takes in M11 and Vx from the register 120a via the selectors
130a and 130b, calculates M11·Vx using the FMUL 141 and sends this to the FADD 142. Then, it takes in
M12 and Vy, calculates M12·Vy, sends this to the FADD 142 and calculates M11·Vx + M12·Vy. Then, it carries
out the same calculation on M13 and Vz, and M14 and Vw, and calculates M11·Vx + M12·Vy + M13·Vz + M14·Vw. The
other FMACs 140b to 140d carry out the same operations. Thus, the FMACs 140a to 140d carry out operations in
parallel, thereby executing 4×4 matrix operations at the same speed as the conventional art.
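The row-per-operator arrangement of the matrix operation can be sketched as follows. This is an illustrative model only: the function name `transform` is hypothetical, a Python list of accumulators stands in for the FMACs 140a to 140d, and the inner loop over rows runs sequentially here although it represents work that the hardware performs simultaneously.

```python
def transform(matrix, vertex):
    """Multiply a matrix by a coordinate vector, one accumulator per row."""
    # One accumulator per sum-of-products operator; operator i is paired
    # with the register holding row i of the matrix and the coordinates.
    accumulators = [0.0] * len(matrix)

    # Each outer iteration models one time step: every operator takes in
    # the next (matrix element, coordinate value) pair from its own register.
    for k, coordinate in enumerate(vertex):
        for row in range(len(matrix)):
            accumulators[row] += matrix[row][k] * coordinate

    return accumulators
```

For a 4×4 matrix the outer loop runs four times, which matches the four multiply-and-add steps each FMAC performs on its own row.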

[0078] As described above, the parallel arithmetic apparatus 100a is an apparatus that selectively carries out the
matrix operations and the vector inner product operations. The parallel arithmetic apparatus is provided with at least
the registers 120a to 120d that record the component values of a transformation matrix as the arithmetical elements
during the matrix operation and record the vector component values as arithmetical elements during the inner product
operation, the FMACs 140a to 140d that take in the arithmetical elements recorded in the registers 120a to 120d and
carry out sum-of-products operations, and the selectors 130a and 130b that select one register from the registers 120a to
120d and supply the arithmetical elements recorded in the selected register to the FMAC 140a. The registers 120b
to 120d form a one-to-one correspondence with the FMACs 140b to 140d. The selectors 130a and 130b supply the
component values of the transformation matrix recorded in the register 120a to the FMAC 140a during the matrix
operation and select the registers 120a to 120d one by one in a round-robin fashion and supply the vector component
value recorded in the selected register to the FMAC 140a during the inner product operation.
[0079] As apparent from the above, the selectors 130a and 130b make it possible to carry out the matrix operation 
and the inner product operation selectively. 
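The mode-dependent selection performed by the selectors can be summarised in a short sketch. The names `Mode` and `select_register` are hypothetical, register indices 0 to 3 stand in for the registers 120a to 120d, and the count argument is assumed to be the number of sum-of-products operations already performed.

```python
from enum import Enum


class Mode(Enum):
    MATRIX = "matrix"
    INNER_PRODUCT = "inner_product"


def select_register(mode, count, num_registers=4):
    """Return the index of the register that feeds the first FMAC.

    Index 0 stands for the register paired with that FMAC.
    """
    if mode is Mode.MATRIX:
        # Matrix operation: always the FMAC's own register.
        return 0
    # Inner product operation: visit the registers one by one, round-robin.
    return count % num_registers
```

During a matrix operation the function returns 0 for every count, while during an inner product operation successive counts yield 0, 1, 2, 3 and then wrap around.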

<Embodiment 2> 

[0080] FIG. 5 is a block diagram of a parallel arithmetic apparatus 100b according to another embodiment. 
[0081] Compared to the parallel arithmetic apparatus 100a shown in FIG. 2, the parallel arithmetic apparatus 100b
is only different in that temporary registers 160b to 160d are provided at the output ends of the registers 120b to 120d.
[0082] This parallel arithmetic apparatus 100b is configured with registers 120a to 120d that record arithmetical 
elements, and the FMACs 140a to 140d that carry out sum-of-products operations based on the arithmetical elements
recorded in these registers 120a to 120d. Selectors 130a and 130b are inserted between the register 120a and the
FMAC 140a, and the temporary registers 160b to 160d are inserted between the registers 120b to 120d and the FMACs
140b to 140d. The selectors 130a and 130b select one from among the register 120a and the temporary registers 160b
to 160d and supply the arithmetical element recorded in the selected register 120a or temporary register 160b to
160d to the FMAC 140a. Operations of these components are controlled by the control circuit 110.
[0083] The temporary registers 160b to 160d have a one-to-one correspondence with the registers 120b to 120d. 
The temporary registers 160b to 160d temporarily store the arithmetical elements recorded in their respective registers
120b to 120d when these are sent to the FMACs 140b to 140d or the selectors 130a and 130b.
[0084] Since the temporary registers 160b to 160d temporarily record the arithmetical elements from the registers 
120b to 120d, even if the arithmetical elements are not taken from the registers 120b to 120d into the FMAC 140a at 
the same timing as in the case of the inner product operation, the read ports of the registers 120b to 120d are not 
occupied by the arithmetical elements for inner product operations. Thus, while the FMAC 140a is carrying out a matrix
operation, the other FMACs 140b to 140d can take in the subsequent arithmetical elements from the registers 120b to 120d,
allowing a sum-of-products operation. 
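The decoupling effect of the temporary registers can be illustrated with a small sketch. It is a loose software analogy under stated assumptions: the classes `Register` and `TemporaryRegister`, the method `latch` and the function `demo` are hypothetical names, and holding a copy of a value stands in for freeing a hardware read port.

```python
class Register:
    """Stand-in for one of the registers holding a pair of elements."""

    def __init__(self, elements):
        self.elements = elements


class TemporaryRegister:
    """Stand-in for a temporary register placed at a register's output."""

    def __init__(self, source):
        self.source = source
        self.held = None

    def latch(self):
        """Copy the current output of the source register and hold it."""
        self.held = self.source.elements
        return self.held


def demo():
    reg_b = Register((2.0, 3.0))
    temp_b = TemporaryRegister(reg_b)
    temp_b.latch()                # elements are held for the selectors
    reg_b.elements = (5.0, 7.0)   # the register already takes in the next pair
    for_selector = temp_b.held    # still the earlier pair
    for_own_fmac = reg_b.elements # the subsequent pair for the paired FMAC
    return for_selector, for_own_fmac
```

Here the held pair stays available to the selectors after the register has been reloaded, which is the property the temporary registers are described as providing.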

[0085] In the above-described embodiments, the entertainment apparatus using the parallel arithmetic apparatus
100 is described as an example, which is not restrictive. The parallel arithmetic apparatus of the present invention may
be used in any information processor which carries out parallel arithmetic processing and carries out at least matrix operations
and vector inner product operations. Moreover, the number of pairs of the registers and the sum-of-product operators 
(FMAC) is not limited to 4. The number of pairs may be determined according to the processing carried out by the 
relevant apparatus. 

[0086] Furthermore, the parallel arithmetic apparatus 100 can also be implemented by causing a computer to
execute the computer program of the present invention. This embodiment forms functional blocks corresponding to
the selectors 130a and 130b on the computer with a plurality of FMACs through a co-operation between the computer 
program recorded in a computer-accessible recording medium such as a disk device or semiconductor memory and 
a control program (OS, etc.) incorporated in the computer. 

[0087] As described above, the present invention can perform the vector inner product operations easily while per- 
forming the matrix operations as efficiently as the conventional arts. 



Claims 

1. A parallel arithmetic apparatus comprising multiple pairs of recording means for recording arithmetical elements 
to be operated and operating means for performing sum-of-products operations based on the arithmetical elements 
recorded in said recording means, wherein 

one of said recording means of all pairs is selected and selecting means for supplying said arithmetical 
elements recorded in the selected recording means to the operating means of said pair is inserted between the 
recording means and operating means of any one pair. 

2. The parallel arithmetic apparatus according to claim 1, wherein temporary recording means for temporarily
recording said arithmetical elements recorded in the recording means of a pair in which said selecting means is not
inserted is inserted between the recording means and operating means of said pair, and
said selecting means is constructed in such a way as to supply the arithmetical elements recorded in said
temporary recording means to said operating means when the recording means of the pair in which said selecting 
means is not inserted is selected. 

3. The parallel arithmetic apparatus according to claim 1, wherein said recording means of all pairs record, during a
matrix operation, a first arithmetical element to be subjected to said matrix operation, and during a vector inner 
product operation, a second arithmetical element to be subjected to said inner product operation, 

said selecting means is constructed, during said matrix operation, in such a way as to input said first arith- 
metical element from the recording means of the own pair to the operating means of the own pair and, during said 
inner product operation, in such a way as to select said recording means of all pairs one by one in a round-robin 
fashion and supply said second arithmetical element from the selected recording means to the operating means 
of the own pair. 

4. The parallel arithmetic apparatus according to claim 1, wherein each of said operating means performs an operation
with a content independently assigned to said pair using said arithmetical elements recorded in the recording 
means of said pair. 

5. The parallel arithmetic apparatus according to claim 4, wherein said operation is an operation associated with any
one of four-dimensional coordinate components. 

6. A parallel arithmetic apparatus that selectively performs a matrix operation and vector inner product operation,
comprising: 

a plurality of recording means for recording, during said matrix operation, a first arithmetical element to be 
subjected to said matrix operation and recording, during said inner product operation, a second arithmetical 
element to be subjected to said inner product operation; 

a plurality of operating means forming a one-to-one correspondence with said plurality of recording means for 
performing, during said matrix operation, a sum-of-products operation by each operating means receiving said 
first arithmetical element recorded in the corresponding recording means, and performing, during said inner 
product operation, a sum-of-products operation by predetermined one of the operating means receiving said 
second arithmetical element recorded in all the recording means; and 

selecting means for selecting, during said matrix operation, the recording means corresponding to said pre- 
determined operating means and supplying a first arithmetical element recorded in this recording means to 
said predetermined operating means, and selecting, during said inner product operation, said plurality of re- 
cording means one by one in a round-robin fashion and supplying a second arithmetical element recorded in 
the selected recording means to said predetermined operating means. 

7. The parallel arithmetic apparatus according to claim 6, wherein said arithmetical element is expressed with a
floating point number and said operating means is constructed so as to perform a sum-of-products operation of
the floating point number. 

8. An entertainment apparatus that performs image processing on an entertainment image by performing a matrix
operation with regard to coordinates expressing a position and a shape of an object and performing an inner 
product operation with regard to vectors used to express an image of said object, comprising: 

a plurality of registers that records, during said matrix operation, a first arithmetical element subjected to said 
matrix operation and records, during said inner product operation, a second arithmetical element subjected 
to said inner product operation; 

a plurality of sum-of-products operators forming a one-to-one correspondence with said plurality of registers 
that performs, during said matrix operation, a sum-of-products operation by each sum-of-products operator 
receiving said first arithmetical element recorded in the corresponding register, and performs, during said inner 
product operation, a sum-of-products operation by predetermined one of the sum-of-products operators re- 
ceiving said second arithmetical element recorded in all registers; and 

a selector that selects, during said matrix operation, a register corresponding to said predetermined sum-of- 
products operator and supplies a first arithmetical element recorded in this register to said predetermined sum- 
of-products operator, and selects, during said inner product operation, said plurality of registers one by one 
in a round-robin fashion and supplies a second arithmetical element recorded in the selected register to said 
predetermined sum-of-products operator. 
9. An entertainment apparatus that performs image processing on an entertainment image by carrying out a matrix
operation between a matrix and coordinate values to perform a coordinate transformation of coordinates expressing
the position and a shape of an object and carrying out an inner product operation between a normal vector oriented 
in the normal direction of the surface of said object and position vector of a light source to determine the display 
mode of the surface of said object, comprising: 

a plurality of registers that records said coordinate values and component values corresponding to any one 
row of said matrix during said matrix operation and records said normal vector and component values corre- 
sponding to any one component of said position vector during said inner product operation; 
sum-of-products operators forming a one-to-one correspondence with said plurality of registers that carry out 
a sum-of-products operation during said matrix operation by each sum-of-products operator receiving said 
coordinate values recorded in the corresponding register and component values corresponding to said one 
row of said matrix, and carry out a sum-of-products operation during said inner product operation by prede- 
termined one of the sum-of-products operators receiving said normal vector recorded in all registers and com- 
ponent values of said position vector; 

a selector that selects, during said matrix operation, a register corresponding to said predetermined sum-of- 
products operator and supplies said coordinate value recorded in this register and component values corre- 
sponding to said one row of said matrix to said predetermined sum-of-products operator, and selects, during 
said inner product operation, said plurality of registers one by one in a round-robin fashion and supplies
component values of said normal vector and said position vector recorded in the selected register to said
predetermined sum-of-products operator.

10. A processing method that allows a matrix operation and vector inner product operation to be selectively executed
and is executed by an apparatus provided with a plurality of operating means, comprising the steps of: 

supplying, during said matrix operation, arithmetical elements subjected to said matrix operation by assigning 
the arithmetical elements to said plurality of operating means based on the features thereof to carry out a sum- 
of-products operation based on the assigned arithmetical elements; and 

supplying, during said inner product operation, arithmetical elements subjected to said inner product operation 
in one predetermined operating means to allow said operating means to carry out a sum-of-products operation 
based on the arithmetical elements.

11. A computer program that makes it possible to selectively execute a matrix operation and vector inner product
operation and causes a computer provided with a plurality of operating means to execute:

a step of supplying, during said matrix operation, arithmetical elements subjected to said matrix operation by 
assigning the arithmetical elements to said plurality of operating means based on the features thereof to carry 
out a sum-of-products operation based on the assigned arithmetical elements; and 

a step of supplying, during said inner product operation, arithmetical elements subjected to said inner product 
operation in one predetermined operating means to allow said operating means to carry out a sum-of-products 
operation based on the arithmetical elements. 

12. A semiconductor device that makes it possible to selectively execute a matrix operation and vector inner product
operation and is built in an apparatus incorporating a computer provided with a plurality of operating means,
causing said apparatus to execute:

a step of supplying, during said matrix operation, arithmetical elements subjected to said matrix operation by 
assigning the arithmetical elements to said plurality of operating means based on the features thereof to allow 
each operating means to carry out a sum-of-products operation based on the assigned arithmetical elements; 
and 

a step of supplying, during said inner product operation, arithmetical elements subjected to said inner product 
operation in one predetermined operating means to allow said operating means to carry out a sum-of-products 
operation based on the arithmetical elements. 